The Role of Morphology in Generating High-Quality Pronunciation Lexica for Regional Variants of Portuguese
Abstract
Grapheme-to-phoneme (GTP) systems for languages such as English, German, and Korean have been shown to perform better when they include a morpho-phonological preprocessing component. While semiautomatic and automatic GTP approaches for Portuguese continue to achieve steady gains, these algorithms do not take morphology into account, despite a growing need to do so that stems in part from the recent spelling reform. This paper presents a pilot study, carried out in the development of the Portuguese Unisyn Lexicon (LUPo), that assesses the role of morphological information in generating high-quality pronunciation lexica for regional variants of Portuguese. It identifies several problematic orthographic contexts, along with the difficulties that arise when morphology is left out of the equation. Building on known issues that affect Portuguese GTP systems, it also addresses new orthographic contexts introduced by the recent spelling reform.
Related resources
A Rule Based Pronunciation Generator and Regional Accent Databank for Portuguese
One of the major obstacles in deploying spoken language technologies (SLTs) in the developing world is a lack of key linguistic resources – e.g. electronic dictionaries, phonetically aligned corpora, pronunciation lexicons, etc. – that describe the non-dominant varieties spoken in such countries and regions. In this paper, we describe the work of the LUPo (Portuguese Unisyn Lexicon) project to ...
Experimental detection of vowel pronunciation variants in Amharic
The pronunciation lexicon is a fundamental element in an automatic speech transcription system. It associates each lexical entry (usually a grapheme), with one or more phonemic or phone-like forms, the pronunciation variants. Thorough knowledge of the target language is a priori necessary to establish the pronunciation baseforms and variants. The reliance on human expertise can pose difficultie...
On the Pronunciation of Common Lexica and Proper Names in European Portuguese
This paper presents some relevant aspects of the pronunciation of proper names and common lexica in European Portuguese. It starts with a brief description of statistical data concerning the occurrence and distribution of graphemes and phonemes for the two corpora and the distinction between different subclasses found in proper names, namely first and last names, toponyms and acronyms. The central t...
Morphological approaches for an English pronunciation lexicon
Most pronunciation lexica for speech synthesis in English take no account of morphology. Here we demonstrate the benefits of including a morphological breakdown in the transcription. These include maintaining consistency, developing the symbol set and providing the environmental description for allophones and phonetic variables. Our approach does not use a full morphological generator, but incl...
Evaluating the Link between Word Frequencies and Pronunciation Variants: a Cross-lingual Study on Read and Spontaneous Speech
The aim of this contribution is twofold: evaluating the use of pronunciation variants in read and spontaneous speech and studying the link between word frequencies and pronunciation variants. The dependence of pronunciation variants on a given system configuration is also addressed in the first part. For the second aspect of this work different variant types are defined. A cross-lingual study is car...